Rank | Count | Beginning |
---|---|---|
167516 | 10159 | Pēc |
244274 | 5115 | Tās |
233869 | 5008 | Tā |
146091 | 3857 | No |
287472 | 2997 | Viņš |
277416 | 2984 | Vēsture |
120487 | 2117 | Līdz |
284432 | 2053 | Viņa |
158783 | 2038 | Par |
220146 | 1900 | Šīs |
113065 | 1826 | Lai |
204951 | 1723 | Šajā |
217027 | 1635 | Šī |
106311 | 1555 | Kopš |
18179 | 1490 | Ar |
274051 | 1404 | Vēlāk |
35486 | 1376 | Biogrāfija |
258160 | 1370 | Tomēr |
257235 | 1332 | To |
116990 | 1301 | Latvijas |
14846 | 1298 | Apdzīvotā |
238383 | 1272 | Tajā |
19146 | 1227 | Arī |
139480 | 1209 | Mūsdienas |
266416 | 1184 | Uz |
87161 | 1131 | Ja |
164125 | 1127 | Pašlaik |
242373 | 1074 | Tāpat |
93068 | 999 | Kad |
183538 | 924 | Pilsēta |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N=1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
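The same count can be sketched outside the database, which also makes the generalisation to N=2, 3, 4 explicit. The following is a minimal Python sketch, not the pipeline used for this corpus: the function name `sentence_beginnings` and the sample sentences are made up for illustration, and sentences are assumed to be plain strings whose words are separated by single spaces, as in the `substring_index` call above.

```python
from collections import Counter


def sentence_beginnings(sentences, n=1, limit=50):
    """Return the `limit` most frequent sentence beginnings of n words.

    Mirrors substring_index(sentence, ' ', n): the beginning is everything
    before the n-th space, or the whole sentence if it has fewer words.
    """
    counts = Counter(" ".join(s.split(" ")[:n]) for s in sentences)
    return counts.most_common(limit)


# Hypothetical sample data, for illustration only.
sample = [
    "Pēc kara pilsēta tika atjaunota.",
    "Pēc tam viņš atgriezās.",
    "Tā ir pilsēta.",
]

for beginning, count in sentence_beginnings(sample, n=1):
    print(count, beginning)
```

Calling `sentence_beginnings(sample, n=2)` counts two-word beginnings in the same way, which corresponds to replacing the `1` in both `substring_index` calls of the query with `2`.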
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.1 Most Frequent Sentence Endings I
4.3.1.2 Most Frequent Sentence Endings II
4.3.1.3 Most Frequent Sentence Endings III
4.3.1.4 Most Frequent Sentence Endings IV